Add FileService as a standalone microservice, LakeFS+S3 as dataset storage #3296
base: master
5db607e to 0e7a9d8
Since installing LakeFS and Minio can be complex, could we add a frontend flag that allows developers to enable the user system without requiring LakeFS and Minio? This would let developers read files directly from their local file system when the user system is enabled.
6d101a1 to da96a2a
OK. I have added a flag in |
LGTM!
Tested on both Windows and Mac. The setup is very smooth.
Please add more details to Step3, for example how developers can migrate to this from the current master.
core/file-service/src/main/scala/edu/uci/ics/texera/service/util/S3StorageClient.scala (Outdated)
core/file-service/src/main/scala/edu/uci/ics/texera/service/util/S3StorageClient.scala
core/file-service/src/main/scala/edu/uci/ics/texera/service/util/S3StorageClient.scala
core/amber/src/main/scala/edu/uci/ics/texera/web/service/ResultExportService.scala
core/file-service/src/main/scala/edu/uci/ics/texera/service/FileServiceConfiguration.scala
core/gui/src/app/dashboard/component/user/list-item/list-item.component.html
...pp/dashboard/component/user/user-dataset/user-dataset-explorer/dataset-detail.component.html
...pp/dashboard/component/user/user-dataset/user-dataset-explorer/dataset-detail.component.scss (Outdated)
07d4789 to 2e38fad
2e38fad to 6accd5b
This PR introduces the FileService as another microservice parallel to WorkflowCompilingService, ComputingUnitMaster/Worker, and TexeraWebApplication.
Purpose of the FileService
Datasets are stored in LakeFS + S3: LakeFS for the version control metadata and S3 for data transfer.
But LakeFS doesn't have an access control layer.
FileService, providing ...
Architecture before and after adding FileService
Before:

After

Key Changes
FileService is introduced. All the dataset-related endpoints are hosted on FileService.
storage-config.yaml
ComputingUnitMaster and ComputingUnitWorker call FileService to read files, and their access is verified during the call. In the dynamic computing architecture (which will be introduced in Add computing unit manager service #3298), they send requests along with the current user's token. In the single-machine architecture, they bypass the network requests by making direct local function calls.
You may refer to core/amber/src/main/python/pytexera/storage/dataset_file_document.py for implementation details. This feature is only available in the dynamic computing architecture.
How to migrate the previous datasets to the new datasets managed by LakeFS
Because of substantial refactoring, the two dataset implementations are NOT compatible with each other. To migrate previous datasets to the latest implementation, you will need to re-upload the data via the new UI.
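Returning to the read path listed under Key Changes, the two modes (a token-carrying request in the dynamic computing architecture, a direct local call on a single machine) can be illustrated with a minimal Python sketch. This is not the code in dataset_file_document.py: the base URL, the `/files?path=` route, the `Bearer` scheme, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Optional
from urllib.parse import quote
from urllib.request import Request

# Hypothetical address; the real FileService host and port are deployment-specific.
DEFAULT_BASE_URL = "http://file-service.example"


def build_read_request(base_url: str, file_path: str, user_token: str) -> Request:
    """Build (but do not send) a GET request that carries the user's token.

    The route and the Bearer scheme are assumptions, not the FileService API.
    """
    url = f"{base_url}/files?path={quote(file_path, safe='')}"
    return Request(url, headers={"Authorization": f"Bearer {user_token}"})


def read_dataset_file(
    file_path: str,
    user_token: Optional[str],
    local_reader: Optional[Callable[[str], bytes]] = None,
    send: Optional[Callable[[Request], bytes]] = None,
    base_url: str = DEFAULT_BASE_URL,
) -> bytes:
    """Read a dataset file through one of the two paths described above."""
    if local_reader is not None:
        # Single-machine architecture: skip the network, call a local function.
        return local_reader(file_path)
    if user_token is None:
        # Dynamic computing architecture: the service needs a token to verify access.
        raise PermissionError("a user token is required for remote reads")
    if send is None:
        raise ValueError("a transport function is required for remote reads")
    return send(build_read_request(base_url, file_path, user_token))
```

Passing the transport in as `send` keeps the sketch free of real network calls; in practice it could wrap `urllib.request.urlopen` or any HTTP client.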
How to deploy the new architecture
Step1. Deploy LakeFS & Minio
Use Docker (highly recommended for local development)
Run docker-compose --profile local-lakefs up -d in core/file-service/src/main/resources.
Use Binary (recommended for production deployment)
Refer to https://docs.lakefs.io/howto/deploy/
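Once Step1 is done, it can help to confirm that both services respond before moving on. The sketch below is one way to do that from Python. It assumes LakeFS listens on its default port 8000 and exposes `/_health`, and that Minio listens on its default port 9000 and exposes `/minio/health/live`; the ports actually mapped by the docker-compose file in this PR may differ, so treat the URLs as placeholders.

```python
from http.client import HTTPException
from urllib.error import URLError
from urllib.request import urlopen

# Assumed default endpoints; adjust to the ports your deployment exposes.
HEALTH_URLS = {
    "lakefs": "http://localhost:8000/_health",
    "minio": "http://localhost:9000/minio/health/live",
}


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on the URL answers with a 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError, HTTPException):
        # Connection refused, timeout, non-2xx response, or malformed reply.
        return False


def check_all(urls: dict = HEALTH_URLS) -> dict:
    """Probe every configured service and report its status by name."""
    return {name: is_healthy(url) for name, url in urls.items()}
```

Calling `check_all()` returns a dictionary such as `{"lakefs": ..., "minio": ...}` with a boolean per service; a `False` entry suggests the container is not running or is mapped to a different port.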
Step2. Configure the storage-config.yaml
Configure the section below in storage-config.yaml:
Here is the configuration you can use directly if you are using core/file-service/src/main/resources/docker-compose.yml to install LakeFS & Minio:
Step3. Launch services
Launch FileService, in addition to TexeraWebApplication, WorkflowCompilingService and ComputingUnitMaster.
Future PRs after this one
amber package.